Harden host-side sparse index width and allocation safety by LwhJesse · Pull Request #2822 · su2code/SU2

LwhJesse · 2026-05-25T21:30:17Z

Harden Host-Side Sparse Index Width and Allocation Safety

Summary

This PR hardens the host-side sparse linear algebra path against integer-width overflow and silent narrowing.

It introduces an explicit 64-bit host sparse-index alias,

using su2_index_t = std::uint64_t;

and uses it in the host sparse-pattern / CSR metadata path where index width and allocation size matter. It also adds checked multiplication for the main storage-size calculations and checked narrowing at the remaining external boundaries.
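As a minimal sketch of what checked multiplication on top of this alias could look like: the helper names `CheckedMul` and `MatrixScalarCount` are hypothetical and not taken from the PR, and the sketch throws a standard exception where the PR reports through SU2_MPI::Error(...).

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

using su2_index_t = std::uint64_t;  // alias quoted in this PR

// Hypothetical helper: multiply two sizes and fail loudly instead of wrapping.
inline su2_index_t CheckedMul(su2_index_t a, su2_index_t b) {
  if (a != 0 && b > std::numeric_limits<su2_index_t>::max() / a) {
    throw std::overflow_error("sparse storage size overflows su2_index_t");
  }
  return a * b;
}

// Hypothetical helper: number of scalars in the block-CSR value array.
inline su2_index_t MatrixScalarCount(su2_index_t nnz, su2_index_t nVar, su2_index_t nEqn) {
  return CheckedMul(CheckedMul(nnz, nVar), nEqn);
}
```

The division-based guard avoids performing the overflowing multiplication at all, so the check itself cannot wrap.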

This PR does not modify the CUDA backend implementation itself. In particular, it does not change:

Common/src/linear_algebra/CSysMatrixGPU.cu
Common/include/linear_algebra/GPUComms.cuh

CUDA matvec correctness remains the responsibility of PR #2816.

Problem

Several sparse linear algebra paths in SU2 currently rely on unsigned long or int for sparse metadata, storage-size arithmetic, or boundary conversion.

On LP64 systems this often goes unnoticed because unsigned long is usually 64-bit. On LLP64 systems such as 64-bit Windows, unsigned long is still 32-bit. At sufficiently large local sparse-matrix sizes, that can lead to:

CSR metadata truncation
wrapped products such as nnz * nVar * nEqn
undersized matrix or vector allocations
wrapped block-value addressing
silent truncation when converting to external library integer types
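The wrapped-product case in the list above can be illustrated with a small sketch that uses std::uint32_t as a stand-in for a 32-bit unsigned long. The function names are illustrative assumptions, not SU2 code.

```cpp
#include <cstdint>

// Emulates 32-bit unsigned long arithmetic (as on LLP64) with std::uint32_t.
// Unsigned overflow is well defined in C++: the result wraps modulo 2^32.
inline std::uint32_t ScalarCount32(std::uint32_t nnz, std::uint32_t nVar, std::uint32_t nEqn) {
  return nnz * nVar * nEqn;
}

// The same product evaluated in 64-bit arithmetic.
inline std::uint64_t ScalarCount64(std::uint64_t nnz, std::uint64_t nVar, std::uint64_t nEqn) {
  return nnz * nVar * nEqn;
}
```

With nnz = 171,798,692 and nVar = nEqn = 5, the 32-bit version returns 4 while the 64-bit version returns 4,294,967,300, so an allocation sized from the 32-bit result would be drastically undersized.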

For the main matrix storage, the critical host-side product is:

nnz_blocks * nVar * nEqn

If that is evaluated in 32-bit arithmetic, the largest representable unsigned value is:

2^32 - 1 = 4,294,967,295

So the first overflowing matrix-scalar count is:

4,294,967,296

For double, that means:

4,294,967,296 * 8 bytes = 34,359,738,368 bytes

which is:

exactly 32 GiB (binary)
about 34.36 GB (decimal)

That is the origin of the roughly 34.36 GB host-side estimate.

To connect that number to an actual sparse matrix, take a common square-block case with nVar = nEqn = 5. Then each nonzero block contributes 25 matrix scalars, so the first overflowing host-side case is:

nnz_blocks = 171,798,692
nVar = nEqn = 5

nnz_blocks * nVar * nEqn
= 171,798,692 * 25
= 4,294,967,300

This is already larger than 2^32 - 1 = 4,294,967,295, so the host-side 32-bit product has crossed the limit. In memory terms, that same case corresponds to:

4,294,967,300 * 8 bytes = 34,359,738,400 bytes

which is again about:

32 GiB binary
34.36 GB decimal

So the 34.36 GB figure is not a separate back-of-the-envelope estimate; it is simply the byte size of the first 5 x 5 double-precision sparse matrix whose nnz_blocks * nVar * nEqn product no longer fits in 32-bit unsigned arithmetic.

Using a rough relation nnz_blocks ~= N_local * z, with z the average nonzero-block count per row, this 5 x 5 threshold corresponds approximately to:

about 17.18 million local points at z = 10
about 8.59 million local points at z = 20

The same argument gives the corresponding first-overflow cases for larger square blocks:

6 x 6: 119,304,648 * 36 = 4,294,967,328
7 x 7: 87,652,394 * 49 = 4,294,967,306
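The first-overflow block counts above follow from a ceiling division of 2^32 by the scalars per block. A small compile-time sketch (the helper `FirstOverflowBlocks` is a hypothetical name introduced here) reproduces them:

```cpp
#include <cstdint>

// Smallest scalar count that no longer fits in 32-bit unsigned arithmetic.
constexpr std::uint64_t kFirstOverflow = std::uint64_t{1} << 32;  // 4,294,967,296

// Hypothetical helper: smallest nnz_blocks with nnz_blocks * b * b >= 2^32
// for square b x b blocks.
constexpr std::uint64_t FirstOverflowBlocks(std::uint64_t b) {
  const std::uint64_t perBlock = b * b;
  return (kFirstOverflow + perBlock - 1) / perBlock;  // ceiling division
}

static_assert(FirstOverflowBlocks(5) == 171798692ULL, "5 x 5 threshold");
static_assert(FirstOverflowBlocks(6) == 119304648ULL, "6 x 6 threshold");
static_assert(FirstOverflowBlocks(7) == 87652394ULL, "7 x 7 threshold");
// Byte size of the first overflowing 5 x 5 case, assuming 8-byte doubles.
static_assert(FirstOverflowBlocks(5) * 25 * 8 == 34359738400ULL, "5 x 5 bytes");
```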

So the threshold is high, but it is still a local sparse-matrix scale that can be reached in large implicit runs.

For reference, the current legacy CUDA kernel in develop can overflow earlier because it uses a signed 32-bit int matrix-scalar offset internally. The largest positive signed 32-bit value is:

2^31 - 1 = 2,147,483,647

so the first overflowing matrix-scalar count is:

2,147,483,648

For double, that means:

2,147,483,648 * 8 bytes = 17,179,869,184 bytes

which is:

exactly 16 GiB (binary)
about 17.18 GB (decimal)

That estimate is specific to the legacy custom CUDA matvec path currently used in develop.
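The signed 32-bit limit can be checked with the same kind of compile-time arithmetic. This is a host-side sketch only; `FitsInInt32` is a hypothetical helper and not part of the legacy kernel.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical check: does a matrix-scalar offset fit in a signed 32-bit int?
constexpr bool FitsInInt32(std::uint64_t offset) {
  return offset <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

static_assert(FitsInInt32(2147483647ULL), "2^31 - 1 is the last representable value");
static_assert(!FitsInInt32(2147483648ULL), "2^31 is the first overflowing count");
// Byte size at the first overflowing count, assuming 8-byte doubles.
static_assert(2147483648ULL * 8 == 17179869184ULL, "16 GiB of double values");
```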

PR #2816 naturally addresses that CUDA-side issue by replacing the old custom matvec kernel, including the legacy signed-int matrix-offset path that creates this earlier overflow limit. That is why this PR does not modify the CUDA backend itself.

What this PR changes

- Introduces su2_index_t = std::uint64_t for the host sparse-index chain that needs it.
- Propagates that type through graph-toolbox sparse-pattern aliases, host-side CSysMatrix sparse metadata, CSysVector size/index API, CPastixWrapper sparse input handling, and the sparse-pattern-facing geometry accessors required for that propagation.
- Adds checked multiplication for the main host allocation-sensitive products, including:
  - nnz * nVar * nEqn
  - nnz_ilu * nVar * nEqn
  - nPointDomain * nVar * nEqn
  - numBlk * numVar
  - numBlkDomain * numVar
- Adds checked narrowing for PaStiX boundary conversion.
- Keeps device sparse indices as unsigned long, but range-checks host sparse indices before converting them for the current CUDA upload path.

After this change, the targeted host-side sparse-index and allocation path no longer fails by silent wraparound or silent truncation. If a value exceeds the remaining boundary types, the code now fails explicitly through SU2_MPI::Error(...).
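A checked narrowing at such a boundary could look roughly like the following sketch. `CheckedNarrow` is a hypothetical name, and a standard exception stands in for the SU2_MPI::Error(...) call that the PR uses.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: narrow an unsigned 64-bit index to another integer
// type, throwing instead of silently truncating.
template <class To>
To CheckedNarrow(std::uint64_t value) {
  static_assert(std::is_integral<To>::value, "target must be an integer type");
  if (value > static_cast<std::uint64_t>(std::numeric_limits<To>::max())) {
    throw std::range_error("index does not fit in the target integer type");
  }
  return static_cast<To>(value);
}
```

For example, `CheckedNarrow<std::int32_t>(idx)` would be the shape of a conversion toward a library that takes 32-bit signed integers, and `CheckedNarrow<unsigned long>(idx)` the shape of the range check before a device upload.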

Validation

git diff --check passed.
The touched CPU/CUDA object files rebuilt successfully in the available build directories.
No CUDA backend implementation files were modified.

For numerical validation, I ran the existing six-case correctness harness with a two-way CPU comparison:

develop CPU
this branch CPU

Cases:

periodic2d_sector
udf_lam_flatplate_s
udf_lam_flatplate_m
udf_lam_flatplate_l
udf_test_11_probes_s
udf_test_11_probes_m

Result:

this branch CPU matched develop CPU in all 6 cases
the final common numeric fields in history.csv matched exactly in all 6 cases
max_abs_delta = 0.0 for every case
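For readers who want to reproduce a comparison of this kind, a max_abs_delta over two rows of numeric fields can be computed as in the sketch below. This is an assumption about the shape of such a check, not the harness used for this PR; `MaxAbsDelta` is a hypothetical name, and parsing history.csv is left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: largest absolute difference between two rows of
// numeric fields (for example, the final history.csv row from each build).
inline double MaxAbsDelta(const std::vector<double>& a, const std::vector<double>& b) {
  if (a.size() != b.size()) {
    throw std::invalid_argument("rows differ in length");
  }
  double maxDelta = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    maxDelta = std::max(maxDelta, std::fabs(a[i] - b[i]));
  }
  return maxDelta;
}
```

A result of exactly 0.0 means every compared field is bitwise equal as a double, which is the outcome reported above.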

For CUDA, the relevant correctness discussion remains PR #2816, since this branch still uses the legacy pre-#2816 custom CUDA matvec implementation and this PR intentionally does not modify that backend. In particular, the current develop-side signed-int offset limit described above is handled naturally by the #2816 backend replacement rather than by this PR.

Notes on testing

This PR does not add a direct large-memory reproducer for the original host-side overflow. The original failure mode depends on LLP64-style 32-bit unsigned long arithmetic, and a faithful end-to-end reproduction would require a much larger local sparse structure than is practical for a routine test here.

PR Checklist

I am submitting my contribution to the develop branch.
My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
I used the pre-commit hook to prevent dirty commits and used pre-commit run --all to format old commits.
I have added a test case that demonstrates my contribution, if necessary.
I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.

LwhJesse added 2 commits May 26, 2026 05:27

Harden host sparse index width and allocation safety

e169734

Apply clang-format updates

ccf7a49

LwhJesse wants to merge 2 commits into su2code:develop from LwhJesse:fix/host-sparse-index-overflow
